Improving the Front-End Noise Preprocessor of MELPe

Authors

  • Xin Lei
  • Mari Ostendorf
  • Lane Owsley
Abstract

In this paper we focus on improving the noise preprocessor (NPP) of the low-rate speech coder MELPe using information from the non-acoustic General Electromagnetic Motion Sensor (GEMS). A generalized linear model approach is proposed to improve voice activity estimation, both at the frame level in the time domain and at the bin level in the frequency domain, using GEMS and context features. HMM-based speech recognition techniques are also investigated to drive the estimators. The improved voice activity parameter estimators are shown to have significantly less error than the estimates from the MELPe NPP. The improved frame-level voice activity estimator achieves a 66% reduction in error, and the improved bin-level voice activity estimates show an error reduction of more than 50%. With an optimal spectral amplitude estimation algorithm in place of the MM-LSA algorithm used in the MELPe NPP, together with the improved voice activity parameters, the processed noisy speech has much less residual noise and higher intelligibility in informal listening tests.
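To make the frame-level idea in the abstract concrete, here is a minimal sketch of a generalized linear model (with a logistic link) that maps per-frame features plus neighbouring-frame context to a speech probability. It is an illustration under stated assumptions, not the authors' estimator: the two features per frame (an acoustic energy and a GEMS energy), the toy values, the context width, and all function names are hypothetical, and the abstract does not specify the link function or the training procedure.

```python
import math

def stack_context(frames, width):
    """Append the features of the +/- `width` neighbouring frames to each frame.
    Indices past either end are clamped to the first/last frame."""
    n = len(frames)
    stacked = []
    for t in range(n):
        row = []
        for k in range(-width, width + 1):
            idx = min(max(t + k, 0), n - 1)
            row.extend(frames[idx])
        stacked.append(row)
    return stacked

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(x, w, b):
    """Linear predictor of the GLM for one feature vector."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

def fit_logistic(X, y, lr=0.5, epochs=4000):
    """Fit weights and bias by batch gradient descent on the mean logistic loss."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(xi, w, b)) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Made-up toy data: [acoustic_energy, gems_energy] per frame.
# Frames 1 and 7 mimic loud noise with no GEMS activity (label 0).
frames = [[0.1, 0.0], [0.9, 0.0], [0.9, 0.8], [1.0, 0.9],
          [0.8, 0.9], [0.2, 0.1], [0.1, 0.0], [0.8, 0.1]]
labels = [0, 0, 1, 1, 1, 0, 0, 0]

X = stack_context(frames, width=1)   # 6 features per frame
w, b = fit_logistic(X, labels)
probs = [sigmoid(score(x, w, b)) for x in X]
```

On this toy set the acoustic energy alone cannot separate loud noise from speech, while the GEMS feature can, which is the intuition behind adding a non-acoustic sensor. A bin-level estimator could in principle reuse the same structure with one feature vector per frequency bin, though the abstract does not give those details.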

Similar resources

RFC 8130: RTP Payload Format for MELPe Codec (March 2017)

This document describes the RTP payload format for the Mixed Excitation Linear Prediction Enhanced (MELPe) speech coder. MELPe’s three different speech encoding rates and sample frame sizes are supported. Comfort noise procedures and packet loss concealment are described in detail.

Dual-microphone Robust Front-end for Arm’s-length Speech Recognition

This paper describes a novel method of improving the performance of a speech recognition front-end in non-stationary background noise. A two-microphone array has been designed that both enhances the speech and provides a continuous estimate of the background noise. This processing has been integrated with the standard ETSI DSR Advanced Front End so that the continuous noise estimate is an input...

Improving the noise and spectral robustness of an isolated-word recognizer using an auditory-model front end

In this study, the performance of an auditory-model feature-extraction “front end” was assessed in an isolated-word speech recognition task using a common hidden Markov model (HMM) “back end”, and compared with the performance of other feature representation front-end methods including mel-frequency cepstral coefficients (MFCC) and two variants (J- and L-) of the relative spectral amplitude (RASTA...

A psychoacoustical model of the auditory periphery as front end for ASR

The application of a psychoacoustical model of the auditory periphery in the field of automatic speech recognition (ASR) is presented. The model was developed to quantitatively predict human performance in typical spectral and temporal masking experiments. Speaker-independent, isolated-digit recognition experiments in different types of noise were carried out to evaluate the robustness of the a...

Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?

Using deep neural networks (DNNs) for automatic speech recognition (ASR) has recently attracted much attention due to the large performance improvement they provide for a variety of tasks. DNNs are known to be robust to overfitting and to be able to remove speaker variability. Another important cause of variability in speech is the presence of noise. A lot of research has been undertaken on noi...

Publication date: 2004